110 research outputs found

    ANNIS: a linguistic database for exploring information structure

    In this paper, we discuss the design and implementation of the first version of the database "ANNIS" (ANNotation of Information Structure). For research based on empirical data, ANNIS provides a uniform environment for storing this data together with its linguistic annotations. A central database promotes standardized annotation, which facilitates interpretation and comparison of the data. ANNIS is used through a standard web browser and offers tier-based visualization of data and annotations, as well as search facilities that allow for cross-level and cross-sentential queries. The paper motivates the design of the system, characterizes its user interface, and provides an initial technical evaluation of ANNIS with respect to data size and query processing.

    Linguistic mechanisms of coherence in aphasic and non-aphasic discourse

    Background: Coherence is the quality that distinguishes discourse from a random collection of sentences. People with aphasia have been reported to produce less-coherent discourse than non-language-impaired speakers. It is largely unclear how coherence is established in natural language and what leads to its impairment in aphasia.

    Aims: This paper presents a cross-methodological investigation on coherence in the discourse of Russian native speakers with and without aphasia. The purpose of this study was to examine the connection between language impairments in aphasia and different aspects of discourse coherence in order to determine the linguistic mechanisms that could be involved in establishing and maintaining it.

    Methods & Procedures: Coherence was operationalised as a combination of four aspects: informativeness, clarity, connectedness, and understandability. Twenty participants were asked to retell the content of a short movie. The retellings were annotated using Rhetorical Structure Theory (RST), a formalistic framework for discourse-structure analysis. Next, they were evaluated for coherence on a four-point scale by trained raters. The ratings were compared between groups. A classification analysis was performed to determine whether the ratings could be predicted based on the macrolinguistic variables collected from the RST annotations and several microlinguistic variables previously linked to coherence.

    Results: Retellings produced by speakers with aphasia received lower ratings than those of control participants on all aspects of coherence. The results indicate that different combinations of microlinguistic and discourse-structure variables play a role in establishing each of the coherence aspects.

    Conclusions: Our results provided supporting evidence on coherence impairment in aphasia. Perception of a discourse as more or less coherent was associated with both microlinguistic and macrolinguistic variables, with different combinations of variables relevant for each of the aspects. Furthermore, we found that discourse structure plays an important role, especially for understandability. We speculate that pragmatic knowledge shared by interlocutors may boost the coherence of aphasic discourse.

    Classifying Italian newspaper text: news or editorial?

    We present a text classifier that can distinguish Italian news stories from editorials. Inspired by earlier work on English, we built a suitable train/test corpus and implemented a range of features, which can predict the distinction with an accuracy of 89.12%. As demonstrated by the earlier work, such a feature-based approach outperforms simple bag-of-words models when transferred to new domains. We argue that the technique can also be used to distinguish opinionated from non-opinionated text outside the realm of newspapers.

    Primary and secondary discourse connectives: definitions and lexicons

    Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and focus on those for written text. We propose to structure these items by differentiating between primary and secondary connectives on the one hand, and free connecting phrases on the other. For the former, we propose that their behavior can be described by lexicons, and we show one concrete proposal that by now has been applied to three languages, with others being added in ongoing work. The lexical representations can be useful both for humans (theoretical investigations, transfer to other languages) and for machines (automatic discourse parsing and generation).

    Connective-Lex: A Web-Based Multilingual Lexical Resource for Connectives

    In this paper, we present a tangible outcome of the TextLink network: a joint online database project displaying and linking existing and newly created lexicons of discourse connectives in multiple languages. We discuss the definition and demarcation of the class of connectives that should be included in such a resource, and present the syntactic, semantic/pragmatic, and lexicographic information we collected. Further, the technical implementation of the database and the search functionality are presented. We discuss how the multilingual integration of several connective lexicons provides added value for linguistic researchers and other users interested in connectives, by allowing crosslinguistic comparison and a direct linking between discourse relational devices in different languages. Finally, we provide pointers for possible future extensions both in breadth (i.e., by adding lexicons for additional languages) and depth (by extending the information provided for each connective item and by strengthening the crosslinguistic links).
